How can experts, sometimes in exacting detail, almost immediately and very precisely 
recall memory items from a vast repertoire? The problem in which we will be interested 
concerns models of theoretical neuroscience that could explain the speed and robustness 
of an expert's recollection. The approach is based on Sparse Distributed Memory, which 
has been shown to be plausible, both in a neuroscientific and in a psychological manner, in 
a number of ways. A crucial characteristic concerns the limits of human recollection, the 
"tip-of-tongue" memory event — which is found at a non-linearity in the model. We expand 
the theoretical framework, deriving an optimization formula to solve this non-linearity. 
Numerical results demonstrate how the higher frequency of rehearsal, through work or 
study, immediately increases the robustness and speed associated with expert memory. 

Keywords: sparse distributed memory, non-linearity, critical distance, theoretical neuroscience, expert memory 



1. INTRODUCTION 

Szilard told Einstein about the Columbia secondary-neutron exper- 
iments and his calculations toward a chain reaction in uranium 
and graphite. Long afterward [Szilard] would recall his surprise 
that Einstein had not yet heard of the possibility of a chain
reaction. When he mentioned it Einstein interjected, "Daran habe ich
gar nicht gedacht!" - "I never thought of that!" He was nevertheless, says
Szilard, "very quick to see the implications and perfectly willing to do 
anything that needed to be done" 

— July 16, 1941, meeting between Leo Szilard and Albert 
Einstein concerning atomic weapons (Rhodes, 2012, p. 305). 

How can experts — like Albert Einstein — immediately find 
meaning given very few cues? How can experts — like Leo 
Szilard — recollect, sometimes in exacting detail, memories that 
non-experts would find baffling? These abilities span wide across 
the spectrum of human activity: From full chess games played 
decades ago, to verses written by Dante, to exotic wines, or to 
the script and actors involved in movie scenes, experts can almost 
immediately and very precisely recall from a vast repertoire. How 
can neuroscience explain the speed and robustness of experts' 
recollection? 

The work done herein can be related to the work done 
by Shepard (1957) and further developed by Nosofsky (1986); 
Shepard (1987) in the sense that the models investigated here 
use conceptual approximation and distancing in what could be 
considered a psychological space. However, this work does not 
aim to continue these authors' approaches to identification, cat- 
egorization, similarity and psychological distance. Here we aim 
at discovering the bounds and limits of conceptual retrieval 
in human memory via the Sparse Distributed Memory (SDM) 
proxy. 

Recently, Abbott et al. (2013) explored a computational level
(as defined by Marr, 1982) account of SDM as a model of
inference. We provide here an initial exploration that may further
the work done by these authors, providing a theoretical founda- 
tion for a computational account of the edges of recollection via 
Sparse Distributed Memory (and possibly other architectures, by 
means of the connectionist common-ground). 

Other approaches that are neurally plausible could include the 
template and chunk theory by Gobet et al. (Gobet and Simon, 
2000; Gobet et al., 2001; Harre et al., 2012; Harre, 2013). Chunks
are stored memory items, and templates include slots in which 
items can vary. 

Recent findings by Huth et al. (2012) provide fMRI evidence that
human semantic representation resides in a continuous,
multidimensional psychological space. The SDM model explored
herein is consistent with these findings in that SDM permits hier- 
archical relationships between concepts, and instantiates a multi- 
dimensional conceptual space which holds attractors to memory 
items that are, in fact, continuous (as a function of their distance 
from the reading point). 

Two of the concepts with which we will deal here are reflected 
in this 1941 meeting: information content, shown by Einstein's 
surprise involved in unexpected information; and the ability to 
rapidly access memory, in detail, shown by Szilard's "long after- 
ward" recollection of the meeting. A third concept we will use is 
that evidence points toward memory being organized around cell 
assemblies, and Sparse Distributed Memory takes advantage of 
this concept. 

2. CELL ASSEMBLIES AND THE SPARSE DISTRIBUTED MEMORY MODEL

2.1. CELL ASSEMBLIES

How is information encoded in the brain? We postulate that 
information is encoded by cell assemblies, not by individual 
neurons (Sakurai, 1996, 1998, 1999). There are at least five
reasons leading to this position. (1) Neurons constantly die, yet
the brain is robust to their loss. (2) There is large variability 
in the activity of individual neurons — as would be expected on 
anatomical and physiological grounds alone. (3) A single neu- 
ron does not participate in a single function; as Sakurai (1998) 
puts it: 

Even the famous "face neurons" in the temporal cortex do not 
respond to single unique faces but to several faces or to several 
features comprising the several faces. (p. 2)

(4) Studies of activity correlation between neighboring neurons 
show very low, if not zero, correlation. (5) Finally, while the num- 
ber of neurons is quite large, it is minute in comparison with 
the different combinations of incoming stimuli one experiences 
during one's lifetime. 

Furthermore, recent literature suggests a connection
between the increased activation of the fusiform face area (FFA) 
and the acquisition of expertise (Gauthier et al., 1999; Xu, 2005; 
McGugin et al., 2012). Current results hold strong evidence 
that FFA activation is correlated with domain-specific expertise 
in naturalistic settings (Bilalic et al., 2011b). Additionally, it
is shown that expertise in object-recognition tasks modulates 
activation in different areas of the brain (Bilalic et al., 2011a), 
including homologous right-left hemispheric activation in both 
object and pattern recognition expertise (Bilalic et al., 2010,
2012). This evidence and the preceding points serve to further 
emphasize the distributed role of activation in recognition and 
expertise. 

Hence we subscribe to the hypothesis that the unit of infor- 
mation encoding is not the individual neuron, but groups of 
neurons, or cell assemblies (Sakurai, 1996, 1998, 1999). In this 
model, shown in Figure 1, a single neuron may participate in a 
large number of assemblies, and the possible number of assem- 
blies is enormous. Cell assemblies, rather than being encumbered 
by such combinatorial explosions, actually take advantage of 
them, as we will see in Sparse Distributed Memory.

2.2. SPARSE DISTRIBUTED MEMORY 

A promising research programme in theoretical neuroscience is 
centered around Sparse Distributed Memory, originally proposed
by Kanerva (1988). SDM is a neuroscientifically and psychologically
plausible model of human memory.

2.2.1. A large space for memory items

SDM introduces many interesting mathematical properties of 
n-dimensional binary space that, in a memory model, are psy- 
chologically plausible. Most notable among these are robustness 
against noisy information, the tip-of-the-tongue phenomenon, 
conformity to the limits of short-term memory (Linhares et al., 
2011), and robustness against loss of neurons. The model has 
been explored in the study of vision and other senses (Olshausen 
et al., 1993; Laurent, 2002; Rao et al., 2002; Mazor and Laurent,
2005). In spite of the increasing number of neuroscientists
displaying interest in Sparse Distributed Memory (Ballard et al.,
1997; French, 1999; Ludermir et al., 1999; Silva et al., 2004;
Laurent, 2006; Bancroft et al., 2012), we still have limited 
understanding of its properties. 

As in some other neuroscientific models, inhibitory and excitatory
signals are represented in binary form. In SDM, both the
data and the storage space belong to $\{0, 1\}^n$, hence a particular
memory item is represented by a binary vector of length $n$,
henceforth called a bitstring. These bitstrings are stored (as with
most computational memory models) in addresses. In SDM, these
also take the form of $n$-dimensional binary vectors.

The distance between two bitstrings is calculated using the 
Hamming distance. Hamming distance is defined for two bit- 
strings of equal length as the number of positions in which the bits 
differ. For example, $00110_b$ and $01100_b$ are bitstrings of length 5
and their Hamming distance is 2. 
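The definition above can be checked with a minimal Python sketch. The function name `hamming_distance` and the use of character strings to hold bits are illustrative assumptions, not part of the model's specification.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bitstrings differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))


# The example from the text: 00110 and 01100 differ in positions 2 and 4.
distance = hamming_distance("00110", "01100")
print(distance)  # 2
```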

The size of the $\{0, 1\}^n$ address space grows exponentially
with the number of dimensions $n$; i.e., $N = 2^n$. While Kanerva
(1988) suggests $n$ between 100 and 10,000, recently he has
postulated 10,000 as a desirable minimum $n$ (Kanerva, 2009). This
is, of course, an enormous space, infeasible to implement
physically.

To solve the feasibility problem of implementing this memory,
SDM takes a uniformly distributed random sample of $\{0, 1\}^n$,
having $N'$ elements, and instantiates only these points of the
space. These instantiated addresses in the sample are called hard
locations, and each hard location implements a set of $n$ counters,
which we will see in more detail. The hard locations allow SDM
to use the entire (virtual) $\{0, 1\}^n$ space through distributed read
and write operations (described in more detail below). A random
bitstring is generated with equal probability of 0's and 1's
in each dimension. Thus, the distance between two random
bitstrings follows a binomial distribution with mean $\mu = n/2$
and standard deviation $\sigma = \sqrt{n/4}$. For large $n$, the vast majority
of the space lies "close" to the mean (i.e., between $\mu - 3\sigma$ and
$\mu + 3\sigma$) and has few shared hard locations: as $n$ grows, two
bitstrings with distance far from $n/2$ are very improbable. We define
two bitstrings to be orthogonal when their distance is close to $n/2$.

FIGURE 1 | Cell assemblies: the information encoded in a single neuron is negligible and fragile. Multiple neurons may provide the best description of
human information processing.

Figure 2 provides a simplified view of the model, with a small 
space for hard locations and a large space for possible locations. 
The model instantiates a random sample of about one million 
hard locations — which is in fact, a minute fraction of the space: 
for $n = 100$, only $100 \cdot 10^6/2^{100} \approx 7 \cdot 10^{-23}$ percent of the whole
space "exists" (i.e., is instantiated), and for $n = 1000$ only
$100 \cdot 10^6/2^{1000} \approx 7 \cdot 10^{-294}$ percent.

2.2.2. Creating a cell assembly by sampling the space at $\mu - 3\sigma$

The activation of addresses takes place according to their
Hamming distance from the datum. Suppose one is writing
datum $\eta$ at address $\xi$; then all addresses inside an $n$-dimensional
circle with center $\xi$ and radius $r$ are activated. So, $\eta$ will be stored
in all of these activated addresses, which are around address $\xi$, as
shown in Figure 3. An address $\xi'$ is inside the circle if its Hamming
distance to the center $\xi$ is less than or equal to the radius $r$, i.e.,
$distance(\xi, \xi') \le r$. Generally, $r \approx \mu - 3\sigma$. The radius is selected
to activate, on average, 1/1000th of the sample, that is, approximately
1000 hard locations for a model with one million hard
locations. To achieve this, a 1000-dimensional memory uses an
access radius $r = 451$, and a 256-dimensional memory, $r = 103$.
This will generate a cell assembly to either store or retrieve a
memory item. With this activation mechanism, SDM provides a
method to write and read any bitstring in the $\{0, 1\}^n$ space.
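The choice of access radius can be illustrated with a short sketch. It assumes that "activating 1/1000th of the sample" means taking the smallest radius whose circle covers at least one address in a thousand of $\{0, 1\}^n$; this reading, the function name `access_radius`, and the parameter `one_in` are assumptions made for illustration.

```python
from math import comb


def access_radius(n: int, one_in: int = 1000) -> int:
    """Smallest r such that at least 1/one_in of {0,1}^n lies within
    Hamming distance r of a fixed address (exact integer arithmetic)."""
    total = 2 ** n
    inside = 0  # number of addresses at distance <= r
    for r in range(n + 1):
        inside += comb(n, r)  # addresses at distance exactly r
        if inside * one_in >= total:
            return r
    return n


# The text quotes r = 451 for n = 1000 and r = 103 for n = 256.
r_1000 = access_radius(1000)
r_256 = access_radius(256)
print(r_1000, r_256)
```

Because hard locations are sampled uniformly, the fraction of the whole space inside the circle is also the expected fraction of hard locations that a read or write activates.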

2.2.3. Writing an item to the memory

Table 1 shows an example of a write operation being performed
in a 7-dimensional memory.

One way to view the write and read operations is to visualize
neurons (hard locations) as vectors, that is, vectors pointing to
certain areas of the space. In the SDM model, the cell assembly (i.e.,
the set of active hard locations) works in unison, rather like a sum
of vectors: as one writes bitstrings in memory, the counters of the
hard locations are updated.

When a bitstring activates a set of hard locations, the active 
hard locations do not individually point to the bitstring that acti- 
vated them, but, taken together, they point to a coordinate in 
space (that is, the bitstring that activated them). In this fashion, 
any one hard location can be said to simultaneously point to many 
different areas of the space, and any point in space is represented 
by the set of hard locations it activates. 

In other words, both reading and writing depend on many
hard locations to be successful. This effect is represented in
Figure 4, where all hard locations inside the circle are activated
and they, individually, do not point to $\eta$. But, as vectors, their sum
points to the general direction of $\eta$. If another datum $\nu$ is written
into the memory near $\eta$, the shared hard locations will have
information from both of them and would not point (directly) to
either. All hard locations, inside and outside of the circle, may also
point elsewhere to other additional data points: as we have seen,
even "face" neurons have multiple functions.

The write operation works as follows: Suppose one is writing
datum $\eta$ at address $\xi$: then all hard locations inside an
$n$-dimensional circle with center $\xi$ and radius $r$ are activated. So,
$\eta$ will be stored in all these activated addresses, which are close
to address $\xi$. An address $\xi'$ is inside the circle if its Hamming
distance to the center $\xi$ is less than or equal to the radius $r$, i.e.,
$distance(\xi, \xi') \le r$. The information will be written to the entire
cell assembly: thus, all hard locations within the circle will be
updated.

FIGURE 3 | Activated addresses inside access radius $r$ around center
address.

Table 1 | Write operation example in a 7-dimensional memory of data
$\eta$ being written to $\xi$, one of the activated addresses.

Each hard location has both an address (given by its bitstring) 
and a value. The value is stored in counters. Each hard location 
has one counter for each dimension in the space. Each counter 
stores, for its dimension, the bit value that has been written 
more frequently (0's or 1's) to its hard location. So each counter,
corresponding to each dimension, is incremented for each bit
1 and decremented for each bit 0 written to that hard location.
Thus, if the counter is positive, the hard location has had more
1's than 0's written to it; if the counter is negative, more 0's than
1's; and if the counter is zero, there have been an equal number of
1's and 0's written to that particular dimension in that particular
hard location. 

Each datum $\eta$ is written into the counters of every activated
hard location inside the access radius, centered on the address $\xi$
that equals the datum: $\xi = \eta$. If some neurons are lost, only a
fraction of the datum is lost, and the memory remains capable 
of retrieving the right datum due to the high redundancy of the 
model. 
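The counter update for a single hard location can be sketched as follows. The three 7-bit data values are made up for illustration (they are not the contents of Table 1), and the helper names `write_to_counters` and `majority_bits` are assumptions.

```python
def write_to_counters(counters: list, datum: str) -> None:
    """Increment the counter for each 1 bit, decrement it for each 0 bit."""
    for i, bit in enumerate(datum):
        counters[i] += 1 if bit == "1" else -1


def majority_bits(counters: list) -> str:
    """Bit most often written in each dimension ('?' marks a tie)."""
    return "".join("1" if c > 0 else "0" if c < 0 else "?" for c in counters)


counters = [0] * 7  # one hard location in a 7-dimensional memory
for datum in ("1011001", "1010001", "0011101"):  # illustrative data only
    write_to_counters(counters, datum)

print(counters)                 # [1, -3, 3, 1, -1, -3, 3]
print(majority_bits(counters))  # 1011001
```

In the full model the same update is applied to every hard location inside the access radius, which is what gives the memory its redundancy.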

2.2.4. Reading an item from memory 

Table 2 illustrates a read operation over a 7-dimensional memory. 

The read operation is performed by polling each activated hard
location and choosing the most-written bit for each dimension.
A hard location is considered activated if it is within a Hamming
distance (radius) of the activating bitstring cue. Activated hard
locations are taken into account in calculating the result of a read
operation, while others are ignored. Reading consists of adding,
for each of the $n$ dimensions, the counters from all activated hard
locations and, for each bit, setting it to 1 if the sum is positive,
setting it to 0 if the sum is negative, and randomly setting it to 0
or 1 if the sum is zero.
Thus, each bit of the returned bitstring is chosen according to all
written bitstrings in the entire cell assembly (i.e., all active hard
locations) and is equal to the bit value most written in that
dimension. In short, the read operation depends on many hard locations
to be successful. If another datum $\nu$ is written into the memory
near $\eta$, the shared hard locations will have information from both
of them without directly pointing to either. In this way, any
one hard location may, in a fashion, simultaneously "point" to
multiple addresses.

FIGURE 4 | Hard-locations pointing, approximately, to the target
bitstring.

Table 2 | Read operation example.

An imprecise cue shares hard locations with the target
bitstring $\eta$, yet it should be possible to retrieve $\eta$ correctly, even
if additional reading operations become necessary to retrieve $\eta$
exactly. When reading a cue $\eta_x$ that is $x$ bits away from $\eta$, the cue
shares many hard locations with $\eta$ (see Figure 5). The number
of shared hard locations decreases as the distance of the cue to
$\eta$ increases, in other words, as $x = d(\eta_x, \eta)$ increases. The target
datum $\eta$ is read in all addresses shared between $\eta$ and $\eta_x$,
thus they will bias the read output toward the direction of $\eta$.
If the cue is sufficiently close to the target datum $\eta$, the output
of the read operation will be closer to $\eta$ than $\eta_x$ originally was.
Iterating the read operation will obtain results increasingly closer
to $\eta$, until it is exactly the same. So reading at $\eta_x$ will yield an $\eta_{x_1}$ that is
closer, reading at $\eta_{x_1}$ yields an $\eta_{x_2}$ that is closer still, and so on until
$\eta_{x_i} = \eta$, if the iteration converges. Hence, performing a sequence
of successive read operations will allow convergence onto the
target datum $\eta$.
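The write, read, and iterated-read operations described above can be put together in a small pure-Python sketch. The parameters (256 dimensions, 2000 hard locations, radius 116, ten interfering items, a cue 20 bits from the target) are assumptions chosen so that the example runs quickly; they are far smaller than the one-million-location memory discussed in the text, and the function names are illustrative.

```python
import random

N_BITS = 256    # dimensions n (assumed, small enough for pure Python)
N_HARD = 2000   # hard locations (assumed; the text uses one million)
RADIUS = 116    # access radius (assumed; activates a few percent here)

rng = random.Random(0)


def distance(a: int, b: int) -> int:
    """Hamming distance between two bitstrings stored as integers."""
    return bin(a ^ b).count("1")


def flip_bits(x: int, k: int) -> int:
    """Return x with k distinct, randomly chosen bits flipped."""
    for i in rng.sample(range(N_BITS), k):
        x ^= 1 << i
    return x


addresses = [rng.getrandbits(N_BITS) for _ in range(N_HARD)]
counters = [[0] * N_BITS for _ in range(N_HARD)]


def activated(cue: int) -> list:
    """Indices of the hard locations within RADIUS of the cue."""
    return [i for i, a in enumerate(addresses) if distance(a, cue) <= RADIUS]


def write(datum: int) -> None:
    """Store the datum at its own address (autoassociative write)."""
    for i in activated(datum):
        row = counters[i]
        for bit in range(N_BITS):
            row[bit] += 1 if (datum >> bit) & 1 else -1


def read(cue: int) -> int:
    """Sum counters over activated locations; take the sign per dimension."""
    rows = [counters[i] for i in activated(cue)]
    result = 0
    for bit in range(N_BITS):
        total = sum(row[bit] for row in rows)
        if total > 0 or (total == 0 and rng.random() < 0.5):
            result |= 1 << bit
    return result


def iterated_read(cue: int, max_reads: int = 10) -> tuple:
    """Read repeatedly until the output stops changing or max_reads is hit."""
    for reads in range(1, max_reads + 1):
        out = read(cue)
        if out == cue:
            return out, reads
        cue = out
    return cue, max_reads


target = rng.getrandbits(N_BITS)
write(target)
for _ in range(10):                # ten unrelated items as interference
    write(rng.getrandbits(N_BITS))

cue = flip_bits(target, 20)        # imprecise cue, 20 bits from the target
recalled, reads = iterated_read(cue)
print(distance(cue, target), distance(recalled, target), reads)
```

With so few stored items the cue lies well inside the region of convergence, so the iteration is expected to settle on the target after very few reads; larger loads or more distant cues would be needed to approach the non-linearity discussed next.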

Since a cue $\eta_x$ near the target bitstring $\eta$ shares many hard
locations with $\eta$, SDM can retrieve data from imprecise cues (i.e.,
as an autoassociative memory). In spite of this characteristic, it
is crucial to know how imprecise this cue could be while still
converging. What is the maximum distance from our cue to the
original data that still retrieves the right answer? There is a
precise point at which a non-linearity occurs, and the qualitative
behavior of the model changes.

A striking feature of this model is its reflection of the psycho- 
logical "tip-of-tongue" phenomenon, which seems to reflect the 
limits of human recollection. It is the psychological state in which 
one knows that one knows some pre-registered memory item, yet 
one is unable to recollect it at a given time. 

The tip-of-the-tongue phenomenon occurs when a person
knows that he/she has been previously exposed to a certain
stimulus, but is unable to recall some specifics. In SDM, a tip-of-tongue
memory event occurs when the expected time to convergence
(or divergence) approaches infinity, in other words, when
successive read iterations fail to converge or to diverge. Kanerva (1988)
called this particular instance of $x$, where the output of the read
operation averages $x$, the critical distance. Intuitively, it is the
distance from which smaller distances converge and greater distances
diverge. In Figure 6, the circle has radius equal to the critical
distance and every $\eta_x$ inside the circle should converge. The figure
also shows an example of convergence in four readings. We posit
that this is a proxy for the edge of human recall: a threshold until
which recollection occurs, and beyond which it no longer occurs.

Kanerva describes this critical distance as the threshold of con- 
vergence of a sequence of read words. It is "the distance beyond 
which divergence is more likely than convergence" (Kanerva, 
1988). Furthermore, "a very good estimate of the critical distance 
can be obtained by finding the distance at which the arithmetic 
mean of the new distance to the target equals the old distance to 
the target" (Kanerva, 1988). 

Kanerva has analytically derived this non-linearity for a very
particular set of circumstances. His original book analyzed a
specific situation with $n = 1000$ ($N = 2^{1000}$), 1,000,000 hard
locations, an access-radius of 451 (with 1000 hard locations in each
circle) and 10,000 writes of random bitstrings in the memory.




FIGURE 6 | In this example, four iterative readings are required to
converge from $\eta_x$ to $\eta$.



This is a very particular set of parameters, and doesn't shed light 
on questions of speed and robustness of expert recollection. In 
the next section we deal with this non-linearity and the issue of 
analyzing critical distance as an optimization problem. 

In subsequent sections, we will derive an equation for the crit- 
ical distance, in terms of SDM's parameters. We will then present 
empirical results of the evolution of the critical distance under 
varying conditions, which shed light on the model's behavior. It
is worth noting that, since SDM is itself a computer simulation, 
what we call empirical results refer to conclusions obtained over 
data from thousands of runs of the simulation. All data and con- 
clusions (aside from theory) herein refer to trials over computer 
simulations. 

3. MATERIALS AND METHODS 

3.1. DERIVING THE CRITICAL DISTANCE AS A MINIMIZATION 
PROBLEM 

Kanerva has shown that, when 10,000 items are stored in the
memory, and the number of dimensions $n = 1000$, then the
critical distance is at a Hamming distance of 209 bits: if one reads
the item at a distance smaller than 209 bits, one is able to
iteratively converge toward the item. If, on the other hand, one reads
the item at a distance higher than 209 bits, the memory
cannot retrieve the item. Furthermore, at the juncture of about 209
bits, expected time to convergence grows to infinity. This reflects
the aforementioned tip-of-the-tongue phenomenon: when one 
knows that one knows a particular bit of knowledge, yet is unable 
to retrieve it at that point. Psychologically, this would entail some 
top-down mechanism which would force the iterated search to 
halt. We establish a maximum number of iterated reads, based on 
repeated simulations (see section 4.2). 

Kanerva thus fixed a number of parameters in order to derive 
this mathematical result: 

1. the number of dimensions, $n = 1000$;

2. the number of other items stored into the memory, at 10,000;

3. the reading method (by pooling all hard locations); 

4. a single write of the target bitstring in the memory; 

5. the access radius of 451, activating approximately 1000 hard 
locations per read or write operation. 

As Kanerva defined it, approximately half of read operations 209
bits away from the target data will bring us closer to the target and
approximately half will move us away from the target. His math
could be simplified to this: each item will activate approximately
1000 hard locations, so writing 10,000 items randomly will
activate a total of 10,000,000 hard locations, giving an average of 10
different bitstrings written in each hard location. When one reads
from a bitstring $\eta_{200}$, 200 bits away from the target $\eta$, $\eta_{200}$ will share
a mean of 97 hard locations with the target (Kanerva, 1988, Table
7.1, p. 63). This way, it is possible to split the set of active hard
locations into two groups: one group having 903 hard locations
with 10 random bitstrings written into each; and another group
having 97 hard locations each with 9 random bitstrings plus our
target bitstring $\eta$.

Let us analyze what happens to each bit of the read bitstrings.
To each bit we have $903 \cdot 10 + 97 \cdot 9 = 9903$ random bits out of a
total of 10,000 bits. The total number of 1-bits is a random
variable that follows the Binomial distribution with 9903 samples and
$p = 0.5$. It has a mean of $9903/2 = 4951.5$ and standard deviation
$\sqrt{9903/4} = 49.75$. If our target bit is 0 we will choose correctly
when our sum is less than half the total, or $10{,}000/2 = 5000$. If our
target bit is 1, our sum is the random variable of total 1-bits plus
the 97 1-bits from our target. Adding a constant number changes
only the mean and does not affect the standard deviation. So
we will choose correctly when our sum, with mean
$4951.5 + 97 = 5048.5$ and standard deviation 49.75, is greater than 5000. In both
cases the probability of choosing the same bit as the target is about 83%.
As we have 1000 bits, on average, we can predict that the result
of the read operation will be 170 bits away from the target.
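The arithmetic of this worked example can be reproduced with a short sketch that uses the normal approximation to the binomial. The helper `normal_cdf` and the variable names are illustrative assumptions; the inputs (97 shared and 903 non-shared hard locations, 10 bitstrings per location) are the values quoted above.

```python
from math import erf, sqrt


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X < x) for X ~ Normal(mean, sd), via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


n = 1000                                    # dimensions
shared, non_shared = 97, 903                # activated locations at distance 200
random_bits = non_shared * 10 + shared * 9  # 9903 random bits per dimension
mean = random_bits / 2                      # 4951.5
sd = sqrt(random_bits / 4)                  # about 49.75
half_total = (random_bits + shared) / 2     # 5000.0

# Target bit 0: correct when the count of random 1-bits stays below 5000.
p_correct_if_0 = normal_cdf(half_total, mean, sd)
# Target bit 1: correct when random 1-bits plus 97 target bits exceed 5000.
p_correct_if_1 = 1.0 - normal_cdf(half_total, mean + shared, sd)

expected_distance = n * (1.0 - p_correct_if_0)
print(random_bits, mean, round(sd, 2))
print(round(p_correct_if_0, 3), round(p_correct_if_1, 3), round(expected_distance))
```

The two conditional probabilities coincide by symmetry. Without rounding the probability to 83%, the expected distance comes out a few bits below the 170 quoted in the text, which does not change the conclusion that a cue 200 bits away moves closer to the target.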

The critical distance is the point where the expected distance of the
read output from the target equals the distance from the bitstring
$\eta_x$ to the target $\eta$, or $x = n(1 - p)$, where $x$ is the distance from the
bitstring to the target, $p$ is the probability of choosing the correct
value of a bit (given by the above technique), and $n$ is the number of
dimensions.

Given that we intend to study the critical distance as a theo- 
retical proxy for the limits of human recollection, we would like 
to explore a larger number of possibilities and parameter settings 
of the model. Hence we compute the non-linearity of the critical 
distance as a minimization problem. Let:

• $d$: be the distance to the target;

• $h$: be the number of hard locations activated during read and
write operations (note that this value depends on the access
radius);

• $s$: be the number of total stored bitstrings in the SDM;

• $H$: be the number of instantiated hard locations;

• $w$: be the number of times the target bitstring was written in
memory;

• $\theta$: be the total of random bitstrings in all $h$ hard locations
activated by a read operation; i.e., the size of a cell assembly;
and

• $\phi(d)$: be the mean number of shared hard locations activated by
two bitstrings $d$ bits away from each other. One can find values
for a 1000-dimensional SDM in Kanerva's book, Table 7.1, p.
63, or the equations to calculate them for any SDM in de Padua Braga
and Aleksander (1995) and Kanerva (1988, Appendix B, p. 125).
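For readers without access to the tabulated values, $\phi(d)$ can be estimated by direct counting. The sketch below is not the derivation of Kanerva's Appendix B; it assumes uniformly random hard locations and counts exactly how many addresses of $\{0, 1\}^n$ fall inside both access circles. The function name `shared_hard_locations` is hypothetical, and the results may differ somewhat from Table 7.1.

```python
from math import comb


def shared_hard_locations(n: int, r: int, d: int, hard_locations: int) -> float:
    """Expected number of hard locations within radius r of BOTH of two
    addresses that are d bits apart, for uniformly random hard locations."""
    # Prefix sums over the n - d coordinates on which the two addresses agree:
    # agree_le[b] = number of ways to mismatch at most b of them.
    agree_le = [0] * (n - d + 1)
    running = 0
    for b in range(n - d + 1):
        running += comb(n - d, b)
        agree_le[b] = running

    count = 0  # addresses lying inside both circles
    for a in range(d + 1):
        # A location mismatching the first address on `a` of the d differing
        # coordinates mismatches the second address on d - a of them.
        budget = r - max(a, d - a)  # mismatches still allowed elsewhere
        if budget >= 0:
            count += comb(d, a) * agree_le[min(budget, n - d)]
    return hard_locations * count / 2 ** n


# Setting of the text: n = 1000, r = 451, one million hard locations.
for d in (0, 100, 200):
    print(d, round(shared_hard_locations(1000, 451, d, 1_000_000), 1))
```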

Consider a memory in which a total of s bitstrings have already 
been stored via write operations. Each of these write operations 
would have activated approximately h hard locations. This way, 
on average, all write operations together activate a total of sh hard 
locations. This gives an average of sh/H random bitstrings stored 
in each hard location. 

Knowing the average number of bitstrings stored in each hard
location, it is simple to find an equation for $\theta$. Each read operation
performed for a cue $\eta_d$ has $\phi(d)$ hard locations shared with the
target bitstring $\eta$, and $h - \phi(d)$ non-shared hard locations. The
non-shared hard locations have only random bitstrings stored in
themselves. However, the shared hard locations have the target
bitstring written $w$ times, resulting in fewer random bitstrings. As
the average number of bitstrings written in each hard location is
$sh/H$, we have:

$$\theta = \frac{s \cdot h}{H} \cdot [h - \phi(d)] + \left(\frac{s \cdot h}{H} - w\right) \cdot \phi(d)$$

$$\theta = \frac{s \cdot h^2}{H} - w \cdot \phi(d)$$

Suppose the $k$-th bit of our target bitstring is zero. The read
operation will correctly choose bit 0 if, and only if, more than half of
the bitstrings from the activated hard locations have the $k$-th bit
0 (setting aside the case of an equal number of zeros and ones$^1$).
As each hard location has $sh/H$ bitstrings and the read operation
activates $h$, half of the bitstrings equals $h \cdot sh/(2H) = sh^2/(2H)$.
Then, to choose correctly, we should have $\sum_{i=1}^{\theta} X_i < sh^2/(2H)$,
where $X_i$ is the $k$-th bit of the $i$-th bitstring stored in each activated
hard location.

Suppose the $k$-th bit of our target bitstring is 1. The read
operation will choose bit 1 when more than half of the bitstrings from
the activated hard locations have the $k$-th bit 1. We have already
seen that half of the bitstrings is $sh^2/(2H)$. But here, as the bit
equals 1 and there are $w$ target bitstrings in each of the $\phi(d)$
shared hard locations, we have to
add $w \cdot \phi(d)$ to the sum. In other words, we must account for the
number of times the target was written into the hard locations
which are activated by both the target and the cue which is at a
distance $d$. This gives us $w \cdot \phi(d) + \sum_{i=1}^{\theta} X_i > sh^2/(2H)$.

Summarizing, we have:

$$P(wrong \mid bit = 0) = 1 - P\left(\sum_{i=1}^{\theta} X_i < \frac{sh^2}{2H}\right)$$

$$P(wrong \mid bit = 1) = P\left(\sum_{i=1}^{\theta} X_i < \frac{sh^2}{2H} - w \cdot \phi(d)\right)$$

We already know that $P(X_i = 1) = P(X_i = 0) = 1/2$. Since each
$X_i$ corresponds to a Bernoulli trial, $\sum_{i=1}^{\theta} X_i \sim Binomial(\theta; 0.5)$,
which has mean $\theta/2$ and standard deviation $\sqrt{\theta/4}$.

The critical distance is the distance where the chance of
convergence to the target equals the chance of divergence from the
target. That is, in the critical distance, the probability of a wrong
choice of the bit, times the number of bits, is equal to the
original distance to the target. Then, the critical distance is the $d$ that
satisfies the equation $P(wrong) \cdot n = d$ or $P(wrong) = d/n$.

Using the theorem of total probability, we have:

$$P(wrong) = P(wrong \mid bit = 0) \cdot P(bit = 0) + P(wrong \mid bit = 1) \cdot P(bit = 1)$$

if we let

$$\alpha = P\left(\sum_{i=1}^{\theta} X_i < \frac{sh^2}{2H}\right)$$

and

$$\beta = P\left(\sum_{i=1}^{\theta} X_i < \frac{sh^2}{2H} - w \cdot \phi(d)\right)$$

thus,

$$P(wrong) = \frac{1}{2} \cdot [(1 - \alpha) + \beta]$$

This way, the equation to be solved is:

$$\frac{1}{2} \cdot [(1 - \alpha) + \beta] = \frac{d}{n}$$

$^1$The case for a random coin toss is negligible, since, as $\theta$ becomes large, its
probability tends toward 0 quickly.

Since $d$ is an integer value and $\theta$ is a function of $d$, this equality
may not be achievable (this describes a range, where for a
certain $d$: leftside > rightside and for $d + 1$: leftside < rightside). In
these cases, the critical distance can be obtained by minimizing the
following function with the restriction of $d \in \mathbb{N}$ and $d \le n$:

$$f(d) = \left| \frac{1}{2} \cdot [(1 - \alpha) + \beta] - \frac{d}{n} \right|$$



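If the size of the cell assembly, $\theta$, is large enough, a good
approximation to the $Binomial(\theta; 0.5)$ is the normal distribution. Let
$N$ be the standard normal distribution with mean zero and
variance one. We have:

$$\alpha \approx N\left(z < \frac{sh^2/(2H) - \theta/2}{\sqrt{\theta/4}}\right)$$

$$\beta \approx N\left(z < \frac{sh^2/(2H) - w \cdot \phi(d) - \theta/2}{\sqrt{\theta/4}}\right)$$

Simplifying (since $\theta = sh^2/H - w \cdot \phi(d)$, so that $sh^2/(2H) - \theta/2 = w \cdot \phi(d)/2$), we have:

$$\alpha \approx N\left(z < \frac{w \cdot \phi(d)}{\sqrt{\theta}}\right)$$

$$\beta \approx N\left(z < \frac{-w \cdot \phi(d)}{\sqrt{\theta}}\right)$$

And we have to minimize the following function with restrictions
of $d \in \mathbb{N}$ and $d \le n$:

$$f(d) = \left| \frac{1}{2} \cdot \left[1 - N\left(z < \frac{w \cdot \phi(d)}{\sqrt{\theta}}\right) + N\left(z < \frac{-w \cdot \phi(d)}{\sqrt{\theta}}\right)\right] - \frac{d}{n} \right|$$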



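In the case studied in Kanerva (1988), $n = 1000$, $h = 1000$,
$H = 1{,}000{,}000$, $s = 10{,}000$, $w = 1$, and $\theta = 10{,}000 - \phi(d)$.
Replacing these values in the equation, we have to minimize:

$$f(d) = \left| \frac{1}{2} \cdot \left[1 - N\left(z < \frac{\phi(d)}{\sqrt{10{,}000 - \phi(d)}}\right) + N\left(z < \frac{-\phi(d)}{\sqrt{10{,}000 - \phi(d)}}\right)\right] - \frac{d}{1000} \right|$$

When $d = 209$, we have $\phi(d) = 87$ and $f(209) = 0.00032$, which
is the global minimum.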



We note once again that equations to calculate $\phi(d)$ have been
derived in Kanerva (1988, Appendix B) and need not be repeated
here; see also the derivations for higher $d$ by de Padua Braga and
Aleksander (1995). The example calculated above used Table 7.1
of Kanerva (1988), which has the values of $\phi(d)$ for $d$ in a
1000-dimensional SDM with one million hard locations.
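The objective function of this section can be evaluated numerically once $\phi(d)$ is available. The following is a minimal sketch under the normal approximation; the names `p_wrong`, `f`, and `critical_distance` are assumptions, and the caller must supply $\phi$ (from Table 7.1 of Kanerva, 1988, or from the cited equations), since no values beyond those quoted in the text are built in.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """N(z' < z) for the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p_wrong(phi_d: float, s: int, h: int, H: int, w: int = 1) -> float:
    """Probability that a read returns the wrong value for one bit."""
    theta = s * h * h / H - w * phi_d   # random bitstrings polled by the read
    x = w * phi_d / sqrt(theta)
    alpha = standard_normal_cdf(x)
    beta = standard_normal_cdf(-x)
    return 0.5 * ((1.0 - alpha) + beta)


def f(d: int, phi_d: float, n: int, s: int, h: int, H: int, w: int = 1) -> float:
    """Objective |P(wrong) - d/n|, small near the critical distance."""
    return abs(p_wrong(phi_d, s, h, H, w) - d / n)


def critical_distance(phi, candidates, n, s, h, H, w=1):
    """Candidate d with the smallest f(d); `phi` maps d to phi(d).

    The candidates should bracket the expected crossing: f is also close
    to zero at d = 0, where the cue already equals the target.
    """
    return min(candidates, key=lambda d: f(d, phi(d), n, s, h, H, w))


# Kanerva's setting with phi(200) = 97, the value quoted in the text,
# for a target written once and for a target written twice.
once = p_wrong(97, s=10_000, h=1000, H=1_000_000, w=1)
twice = p_wrong(97, s=10_000, h=1000, H=1_000_000, w=2)
print(round(once, 3), round(twice, 3))
```

Writing the target a second time ($w = 2$) lowers the per-bit error probability at the same distance, which is the mechanism by which rehearsal is expected to enlarge the region of convergence.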

In the following section we briefly reiterate and discuss the 
contribution of the theoretical model, and turn to empirical 
results pertaining to the exploration of the critical distance. 
We vary parameters of the memory model in order to explore 
the changes to the critical distance. These empirical trials yield 
enlightening results pertaining to the critical distance as a paral- 
lel for the edge of human recollection and for human expertise 
in SDM. 

4. RESULTS 

In this text, we show that, given $\alpha$, $\beta$, and $d$, minimizing the
function (repeated here from the previous derivation):

$$f(d) = \left| \frac{1}{2} \cdot [(1 - \alpha) + \beta] - \frac{d}{n} \right|$$

solves the issue of non-linearity involved in the critical distance of
the model, that is, the psychological limits of human recollection
at a given point in time. Such a result should be valuable to assess
whether the memory is prone to convergence or divergence.

This result may help provide avenues of exploration in theoret- 
ical neuroscience and can be readily available to cognitive model- 
ers. Yet, it still falls short of giving us an intuitive understanding of 
the speed and robustness of the memory of experts. Therefore, we 
will explore the critical distance behavior at different configura- 
tions. We have implemented the model and conducted a large set 
of computational experiments, whose visualizations illuminate 
the issue of expert memory. 

4.1. NUMERICAL SIMULATIONS: VISUALIZING THE MEMORY 
DYNAMICS 

So far we have seen a single particular case with set parameters, 
and our goal is to understand the speed and robustness of expert 
memory. Let us consider variations of these parameters, and com- 
pute, through simulations, the behavior of the critical distance. 
We vary the number of dimensions N ∈ {256, 1000}, we vary the
number of stored items from the set {1000, 2000, ..., 50,000}, and
we vary the rehearsal number: the number of times an item has 
been stored in the memory. 

The following figures depict heat maps describing the behav- 
ior of the critical distance. In these simulations, all items are 
stored at their respective locations, that is, a bitstring x is 
stored at the location x. Generating each heat map proved 
computationally demanding: when N = 1000, approximately
305,000,000,000,000 bit-compares are required ($5 \cdot 10^{12}$ to
store items in memory and $3 \cdot 10^{14}$ to read items from
memory). Each individual pixel demands an average of
7,000,000,000 bit-compares.

All figures presented below have three colored lines. The green 
line marks the first occurrence of non-convergence to the exact 
target bitstring. The red line marks the last occurrence of the 
convergence to the exact target bitstring. Finally, the blue line
marks the estimated critical distance, that is where the read output, 
on average, equals the input distance to the target bitstring. It is an 
estimation because the critical distance is not exactly defined this 
way. Critical distance is the point or region in which both diver- 
gence and convergence have a 50% chance to occur. That is, all 
points before the green line converged, all points after the red line 
diverged, and the points between these lines sometimes converge 
and sometimes diverge. 
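
The three lines described above can be sketched as a small Python function operating on one row of a heat map. This is a minimal illustration under assumed conventions, not the authors' plotting code: `trials` is a hypothetical mapping from each input distance to the list of output distances to the target observed after reading, with 0 meaning exact convergence, and the blue line is taken to be the first non-zero input distance at which the mean output is no closer to the target than the input was.

```python
from statistics import mean


def line_markers(trials):
    """Return (green, red, blue) markers for one heat-map row."""
    distances = sorted(trials)
    # Green: first input distance with at least one non-converged read.
    green = next((d for d in distances
                  if any(out != 0 for out in trials[d])), None)
    # Red: last input distance with at least one exactly converged read.
    red = next((d for d in reversed(distances)
                if any(out == 0 for out in trials[d])), None)
    # Blue: first non-zero input distance whose mean output distance
    # is at least as large as the input distance.
    blue = next((d for d in distances
                 if d > 0 and mean(trials[d]) >= d), None)
    return green, red, blue


example = {
    0: [0, 0, 0],
    1: [0, 0, 0],
    2: [0, 0, 1],
    3: [0, 2, 5],
    4: [0, 6, 7],
    5: [6, 7, 8],
}
print(line_markers(example))  # (2, 4, 4)
```

In this toy row every read converges before distance 2, none converges after distance 4, and reads in between sometimes converge and sometimes diverge.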

One may notice that, despite not having an exact convergence, 
almost all points between the green and the red line are near the 
target bitstring. 

4.2. INFLUENCE OF ITERATIVE READINGS IN CRITICAL DISTANCE 

The number of iterative-readings is an important parameter of
an SDM implementation. Simulations were done in a 1000-dimensional
and a 256-dimensional SDM, both with one million hard locations,
activating (on average) 1000 hard locations per operation and
varying the number of times the target bitstring η is written to
memory.

For each write-strength of η (written once, twice, five times,
nine times) we varied the saturation of the SDM, that is, the number
of random bitstrings written (once each) along with η in the
memory. We varied this from 1000 to 50,000 random bitstrings,
in increments of 1000. Once the memory was populated with η plus the
random bitstrings, we performed 1 to 40 iterative-readings at each
possible distance from the target (from zero to the number of
dimensions).
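
The procedure just described can be sketched at a much smaller scale in plain Python. The class below is a simplified stand-in for an SDM, not the authors' implementation: the sizes (64 dimensions, 300 hard locations, access radius 28), the class and helper names, and the rule that ties are read as 0 are all assumptions chosen so that the example runs in a fraction of a second. Writing a bitstring w times is modelled as a single write with weight w, which leaves the counters in the same state.

```python
import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bitstrings stored as Python ints."""
    return bin(a ^ b).count("1")


def flip_bits(x: int, d: int, n: int, rng: random.Random) -> int:
    """Return a copy of x with d distinct, randomly chosen bits flipped."""
    for position in rng.sample(range(n), d):
        x ^= 1 << position
    return x


class TinySDM:
    """Scaled-down SDM: counters at hard locations, majority-rule reads."""

    def __init__(self, n=64, hard_locations=300, radius=28, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.radius = radius
        # Hard-location addresses are sampled uniformly at random.
        self.addresses = [rng.getrandbits(n) for _ in range(hard_locations)]
        self.counters = [[0] * n for _ in range(hard_locations)]

    def activated(self, address):
        """Indices of hard locations within the access radius of address."""
        return [i for i, a in enumerate(self.addresses)
                if hamming(a, address) <= self.radius]

    def write(self, bitstring, times=1):
        """Store bitstring at its own address, `times` times (rehearsal)."""
        for i in self.activated(bitstring):
            row = self.counters[i]
            for k in range(self.n):
                row[k] += times if (bitstring >> k) & 1 else -times

    def read(self, address):
        """Single read: sum counters of activated locations, then threshold."""
        sums = [0] * self.n
        for i in self.activated(address):
            for k, c in enumerate(self.counters[i]):
                sums[k] += c
        result = 0
        for k, total in enumerate(sums):
            if total > 0:  # ties (total == 0) are read as 0 in this sketch
                result |= 1 << k
        return result

    def iterative_read(self, cue, max_reads=6):
        """Feed each read back in as the next address, up to max_reads."""
        current = cue
        for _ in range(max_reads):
            nxt = self.read(current)
            if nxt == current:  # fixed point reached
                break
            current = nxt
        return current


rng = random.Random(1)
memory = TinySDM()
target = rng.getrandbits(memory.n)
memory.write(target, times=2)        # target written twice
for _ in range(50):                  # add random bitstrings, once each
    memory.write(rng.getrandbits(memory.n))
for d in (0, 4, 8, 16, 24):          # cues at increasing distances
    cue = flip_bits(target, d, memory.n, rng)
    print(d, hamming(memory.iterative_read(cue), target))
```

Each printed pair gives the input distance of the cue and the distance of the final read from the target; repeating the loop over many cues, saturations, and write-strengths yields the kind of data summarized in the heat maps.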

Figures 7A-D show, respectively, a 1000-dimensional SDM
checked with a single read and with 6, 10, and 40 iterative-readings.
There is a large difference between a single read and multiple reads,
but only a small difference from 6 to 10 and from 10 to 40
iterative-readings. These observations also apply to our tests with the
256-dimensional SDM: compared with the 1000-dimensional
SDM, we found a smaller, more gradual difference from a single
read to more reads, and again a minute difference from 6 to 10
and from 10 to 40 iterative-readings. Given these results and
the number of computations needed in each simulation,
all other simulations were done using 6 iterative-readings, since
40 iterative-readings offer only a slight improvement over six.

It is unexpected that, after 40,000 writes in the 1000-dimensional
memory, the critical distance is so small. Kanerva
(1988) showed that, under these parameters, the memory capacity
is slightly less than 100,000 items. The author defines the SDM
as saturated, that is, at capacity, when its critical distance is zero.
In the 256-dimensional memory, this behavior starts after 20,000 writes.
This is unexpected, since Kanerva's estimation for N = 256 is
between 112,000 and 137,000 random bitstrings stored.

FIGURE 7 | Influence of number of iterative-readings in a
1000-dimensional SDM memory. Panels (A-D) plot the critical distance
behavior against the original distance for 1 write of the target with,
respectively, a single read, 6, 10, and 40 iterative-readings.

Our principal hypothesis for the discrepancies between our
empirical results and the original theory is that, while the hard
locations are instantiated as samples from a uniform distribution
and our simulations wrote bitstrings randomly, the hard locations do
not saturate uniformly. Any write activates a fixed average number of
hard locations (around 1000 in our case), but the variance is not
insignificant. One bitstring read may activate 900 hard locations while
another (in another area of the space, be it close or far) may activate
1100. Thus, certain hard locations would come to hold more
noise than signal sooner than others, rather than the memory
degrading uniformly. This discrepancy would cause, in the
aggregate, a saturation of the SDM with fewer bitstrings stored
than expected in theory. This remains one possibility, and we
hope the issue will be explored in future work.
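
To give a rough sense of the spread in the number of activated hard locations, the following Python sketch models that number as a binomial count. This is an assumption made for illustration: it treats each of the one million hard locations as falling inside the access radius independently, with probability 1000/1,000,000, which is one simple reading of "uniformly sampled" hard locations rather than a result from the simulations.

```python
import math

H = 1_000_000          # hard locations
p = 1000 / H           # chance that a given hard location is activated

mean_activated = H * p
sd_activated = math.sqrt(H * p * (1 - p))

print(f"mean = {mean_activated:.0f}, sd = {sd_activated:.1f}")
```

Under this model the standard deviation is about 31.6 hard locations, so reads that activate 900 or 1100 locations sit roughly three standard deviations from the mean: uncommon for any single operation, but expected to occur over the very large number of operations in each simulation.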

4.3. INFLUENCE OF THE NUMBER OF WRITES ON THE CRITICAL 
DISTANCE 

The influence of the number of writes on the critical distance was 
not analyzed by Kanerva. It is important because, when a random 
bitstring is seen only once, it is psychologically plausible that it 
will be gradually forgotten with new incoming information. What 
matters is not exactly the number of writes, but the proportion of 
the number of times a bitstring was stored in relation to others. 

A remark on cognitive psychology is in order here. Consider, 
as an example, the aforementioned exchange between Szilard
and Einstein. As an expert confronts unexpected information, 
it is reasonable to expect that additional memory writes will 
occur. If we presume that evolution brought the human memory 
close to optimality, as explored by the rational analysis approach 
(Anderson and Milson, 1989; Anderson, 1990; Anderson and 
Schooler, 1991), one would expect some mechanism akin to 
Shannon's idea of information content to be in play. 

That is, suppose an expert is surprised by new, unforeseen
information, say, an outcome o with information I(o) = −log(P(o)),
where I stands for the information content in outcome o. One
would then expect the expert's memory to place additional
attention on the outcome, leading to: (1) additional writes
to memory, or (2) amplification of the write operation's signals,
or possibly (3) both effects.
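
The information content used in this argument is straightforward to compute. The sketch below is only an illustration of the formula: the text leaves the base of the logarithm unspecified, so base 2 (information measured in bits) is an assumption here, and the function name is hypothetical.

```python
import math


def information_content(probability: float) -> float:
    """Shannon information content, in bits, of an outcome with the
    given probability (base-2 logarithm assumed)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


for p in (0.5, 0.1, 0.001):
    print(p, round(information_content(p), 2))
```

The rarer the outcome, the larger its information content, which is the quantity that the argument above ties to additional or amplified memory writes.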

Figures 8A-D show a 1000-dimensional SDM with 1, 2, 5, and
9 writes of the target bitstring η. There is a large difference
from 1 to 2 writes: although the green line behaves strangely
near 50,000 stored items, the critical distance is much greater
than with 1 write. From 2 to 5 to 9 rehearsals, the critical distance
first grows rapidly and then slows down near six writes. This makes
sense, since the critical distance should have a threshold smaller
than 500 bits.

The 256-dimensional memory has a similar behavior, but less
abrupt. Its critical distance keeps growing, but more slowly than in
the 1000-dimensional memory. It never crosses 50 bits on the x-axis
in the 256-dimensional memory, while in the 1000-dimensional memory
it reaches 200 bits and almost hits 400 bits on the x-axis.

These figures display the immense power of reinforcement or 
rehearsal: additional writes of a memory item significantly raise 
the attractor basin (critical distance) for that memory item. 

This behavior is plausible, as the human brain rapidly recog- 
nizes a pattern when it is used to it. Many times, the patterns 
appear in different contexts, giving cues far from the target con- 
cept, much like a chess player, who looks at a position and rapidly 
recognizes what is happening (Bilalic et al., 2009; Rennig et al.,
2013). 

5. DISCUSSION 

This is the first work focused on better understanding the criti- 
cal distance behavior of a Sparse Distributed Memory (Kanerva, 
personal communication). Our future research intends to explore 
the rehearsal mechanisms in cognitive architectures for one of 
the most studied domains of expertise: chess (Linhares, 2005; Linhares
and Brum, 2007; Linhares and Freitas, 2010; Linhares et al.,
2012; Linhares and Chada, 2013; Linhares, 2014), and attempt
to bridge the low-level world of neurons and their assemblies 
with the high-level world of abstract thought and understand- 
ing of strategic scenarios. We have argued here that, as SDM 
remains both a psychologically plausible and a neuroscientifically 
plausible model of human memory, the study of its critical dis- 
tance may provide insights into the edges of our own recollection. 
Without a precise understanding of the critical distance behavior, 
one cannot advance the theoretical model. Moreover, one can- 
not develop robust applications without knowing the limits of 
convergence. 

The empirical tests shown here confirmed that the critical dis- 
tance in SDM constitutes a "band" wherein both convergence 
and divergence become less and less likely. This is a palatable 
result because, intuitively, the Tip-of-the-Tongue phenomenon in 
humans seems like an attractor, something we sometimes "fall 
into." We argue this is a parallel between SDM and human rec- 
ollection, and posit that our theoretical and empirical results 
provide evidence that the critical distance is a correlate to the edge 
of human recollection. 

While humans sometimes fall into the TotT, there are also 
times when we almost fall into it and, after a bit of effort, are able 
to recall the desired information. In the model, this would mean 
we enter the critical band, but leave it after one or two iterations 
and converge. Likewise, it seems one can be very certain of what 
one is saying and, in mid-sentence, completely diverge from the 
next piece of information we wished to recall. In SDM this would 
amount to entering the critical band, but then diverging. 

As Figure 6 shows, the speed of convergence is a function of 
the number of read operations: additional read operations bring 
one closer to the memory item (assuming that the original cue 
was not past the critical point). We also see that this effect is 
greatly reduced after 6 to 10 read operations. As Figure 7 shows, 
expertise can be correlated with providing additional writes to 
the memory, and we show that increasing the rehearsal num- 
ber greatly increases the margin for error or ambiguity, and greatly 
decreases the relevant information needed for convergence, as the 
critical threshold is increased. In human terms, experts "know 
what you are talking about" with fewer cues. Their memory has 
much greater robustness. 

Yet, it is the combination of these two dynamics that sheds light 
on experts' speed. Taking the SDM model as a plausible account 
of human memory, we can compare by saying that, for experts, 
having a much higher threshold may signify being able to converge 
within fewer, or even a single, read operation. As the hard locations 
have been reinforced with the original information, read oper- 
ations converge faster. With very few cues and noisy, ambiguous, 
information, experts may still manage to recollect and understand 
almost immediately the object, situation, or event in question. It is 
no wonder Albert Einstein could immediately grasp Leo Szilard's 
concerns. 

5.1. DATA SHARING 

All the computational methods developed in this study are available
as an open-source project, and can be found at
https://github.com/msbrogli/sdm.